Support Vector Machine Parameter Optimization for Text Categorization Problems
Abstract
This paper analyzes how various Support Vector Machine (SVM) parameters influence text categorization performance. The study uses several text collections and different sets of subject headings (up to 1168 headings). We show that parameter optimization can substantially increase text categorization performance. We estimate the range in which optimal parameter values should be sought and describe an algorithm for finding them. We introduce the notion of stability of a classification algorithm and analyze how the stability of SVM depends on the number of documents in the example set. Finally, we offer practical recommendations for applying SVM to real-world text categorization problems.
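The abstract does not spell out the authors' search algorithm or their recommended range. Purely as an illustration of what "parameter optimization" means for an SVM text classifier, the sketch below evaluates the regularization parameter C on a logarithmic grid against held-out documents. It is not the paper's method. It assumes a linear SVM trained with a Pegasos-style stochastic subgradient method, raw term counts as features, a hypothetical grid C = 10^-3 ... 10^3, and a handful of invented toy documents. The names `bow`, `train_linear_svm`, `accuracy`, and `grid_search_C` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Doc = Dict[str, float]  # sparse vector: term -> weight


def bow(text: str) -> Doc:
    """Hypothetical featurizer: raw term counts plus a constant bias feature."""
    vec: Doc = {"__bias__": 1.0}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0.0) + 1.0
    return vec


def dot(w: Doc, x: Doc) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X: Sequence[Doc], y: Sequence[int], C: float,
                     epochs: int = 20, seed: int = 0) -> Doc:
    """Pegasos-style stochastic subgradient descent on
    (lam/2)*||w||^2 + (1/n)*sum_i max(0, 1 - y_i*<w, x_i>) with lam = 1/(n*C),
    which has the same minimizer as the C-parameterized soft-margin SVM.
    The bias lives in w via "__bias__", so it is regularized as well (an assumption)."""
    n = len(X)
    lam = 1.0 / (n * C)
    rng = random.Random(seed)
    w: Doc = {}
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])  # margin under the current weights
            # L2 step: shrink every weight by (1 - eta*lam) = (1 - 1/t).
            w = {k: v * (1.0 - eta * lam) for k, v in w.items()}
            # Hinge-loss subgradient: update only on margin violations.
            if margin < 1.0:
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def accuracy(w: Doc, X: Sequence[Doc], y: Sequence[int]) -> float:
    correct = sum(1 for x, label in zip(X, y)
                  if (1 if dot(w, x) >= 0.0 else -1) == label)
    return correct / len(y)


def grid_search_C(X_tr, y_tr, X_val, y_val,
                  exponents=range(-3, 4)) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Try C = 10^e for each exponent; the grid itself is an assumption."""
    results = []
    for e in exponents:
        C = 10.0 ** e
        w = train_linear_svm(X_tr, y_tr, C)
        results.append((C, accuracy(w, X_val, y_val)))
    # max() keeps the first C in grid order among ties.
    best_C, best_acc = max(results, key=lambda r: r[1])
    return best_C, best_acc, results


# Invented toy documents: +1 = finance, -1 = sports.
train = [("stock market shares fall", 1), ("bank raises interest rates", 1),
         ("team wins football match", -1), ("player scores late goal", -1),
         ("shares rally on market news", 1), ("coach praises team defense", -1)]
val = [("market rates and shares", 1), ("football team goal", -1)]

X_tr = [bow(t) for t, _ in train]
y_tr = [label for _, label in train]
X_val = [bow(t) for t, _ in val]
y_val = [label for _, label in val]

best_C, best_acc, results = grid_search_C(X_tr, y_tr, X_val, y_val)
for C, acc in results:
    print(f"C={C:g}  val_acc={acc:.2f}")
print("best C:", best_C)
```

On a realistic collection one would typically use tf-idf weighting and cross-validation rather than a single tiny validation set; with data this small, several values of C are likely to tie.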
Similar resources
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods for searching, filtering, and managing resources are strongly needed. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, a main goal in text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Text Categorization Problem
The document categorization problem has gained importance in recent years due to the growing number of digital documents. This paper analyzes the performance of different classification algorithms on the text categorization problem. The importance of parameter optimization for the performance of the algorithms is also discussed. The paper mostly focuses on the SVM (Support Vector Machines) al...
An Improved Feature Selection Method in Chinese Text Categorization
This paper reports a method that improves feature selection performance in Chinese text categorization. The paper first uses an improved information gain (IG) measure to select the initial features, and then experiments with two feature selection methods. The first method chooses the top n features across all classes as the vector space, while the second method chooses the top k features for each class...
Text Categorization and Support Vector Machines
Text categorization is used to automatically assign previously unseen documents to a predefined set of categories. This paper gives a short introduction into text categorization (TC), and describes the most important tasks of a text categorization system. It also focuses on Support Vector Machines (SVMs), the most popular machine learning algorithm used for TC, and gives some justification why ...
Text Categorization with Support Vector Machines: Learning with Many Relevant Features (Universität Dortmund, Fachbereich Informatik, Lehrstuhl VIII Künstliche Intelligenz)
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and they behave robustly over a variety o...
Publication date: 2003